Lead ClickHouse audit sort key with tenant + LowCardinality columns #126

Merged
lohanidamodar merged 1 commit into main from feat/tenant-time-projection on Jun 22, 2026

lohanidamodar commented Jun 22, 2026

Contributor

What

Leads the ClickHouse audit table sort key with tenant and stores genuinely
low-cardinality columns as LowCardinality.

New shared audit tables are now created natively with:

  • ORDER BY (tenant, time, id) plus allow_nullable_key = 1, so tenant-scoped
    time-range reads prune by tenant first. Single-tenant (non-shared) tables keep
    the historical (time, id) key.
  • event, actorType, resourceType as LowCardinality(String) and country
    as LowCardinality(Nullable(String)) — smaller storage, faster GROUP BY /
    equality scans. Reads and writes are unchanged.
  • The full standard secondary-index set, unchanged.
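The branching between the shared and single-tenant keys could look roughly like the following. This is a minimal sketch, not the library's code: the function name `buildAuditTableDdl`, the `MergeTree` engine, and the `id`, `time`, and `tenant` column types are assumptions made for illustration, and the secondary indexes are omitted.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not the library's API): builds a CREATE TABLE
 * statement whose sort key depends on whether shared-tables mode is on.
 */
function buildAuditTableDdl(string $table, bool $sharedTables): string
{
    $columns = [
        'id String',                                // assumed type
        'time DateTime64(3)',                       // assumed type
        'event LowCardinality(String)',
        'actorType LowCardinality(String)',
        'resourceType LowCardinality(String)',
        'country LowCardinality(Nullable(String))',
    ];

    if ($sharedTables) {
        // Assumed type; a nullable key column is why allow_nullable_key is needed.
        $columns[] = 'tenant Nullable(UInt64)';
    }

    // Shared tables lead with tenant; single-tenant tables keep (time, id).
    $orderBy = $sharedTables ? '(tenant, time, id)' : '(time, id)';
    $settings = $sharedTables ? "\nSETTINGS allow_nullable_key = 1" : '';

    return sprintf(
        "CREATE TABLE IF NOT EXISTS %s (\n    %s\n) ENGINE = MergeTree\nORDER BY %s%s",
        $table,
        implode(",\n    ", $columns),
        $orderBy,
        $settings
    );
}

echo buildAuditTableDdl('audits', true), "\n";
```

Passing `false` as the second argument yields the historical `ORDER BY (time, id)` form with no `tenant` column and no `SETTINGS` clause.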

Scope

This PR covers fresh schema only. Because new tables are created natively with
the tenant-leading key, they never need a projection.

Migrating a pre-existing (time, id) table — adding and materializing the
p_tenant_time projection, converting columns to LowCardinality, and dropping
the now-redundant bloom indexes — is an out-of-band operator step, run
externally. It is intentionally not part of the library, so setup() stays
safe to call on every boot.

Tests

A live integration test (testSharedTableSortKeyLeadsWithTenant) runs setup()
against the ClickHouse service in shared-tables mode under a unique namespace,
introspects system.tables.sorting_key, and asserts the key leads with
tenant. It cleans up its table afterwards so it is repeatable.

lohanidamodar force-pushed the feat/tenant-time-projection branch from 32ed906 to 35c9cc5 on June 22, 2026 02:06

greptile-apps Bot commented Jun 22, 2026

Greptile Summary

  • Updates fresh ClickHouse audit table DDL so shared tables use ORDER BY (tenant, time, id) while single-tenant tables keep ORDER BY (time, id).
  • Converts selected low-cardinality audit fields to ClickHouse LowCardinality column types.
  • Adds a live ClickHouse integration test that verifies shared-table sorting keys lead with tenant.

Confidence Score: 4/5

The schema changes are focused, but the new live test can reject a valid tenant-leading ClickHouse sort key format.

The main implementation intent is clear and narrowly scoped, with attention needed on the assertion logic for ClickHouse's reported sorting key representation.

tests/Audit/Adapter/ClickHouseTest.php

T-Rex Logs

What T-Rex did

  • T-Rex attempted to start the repository ClickHouse service with docker compose, but the environment has no docker executable.
  • T-Rex created a minimal PHP harness for the exact sorting_key assertion using a (tenant, time, id) tuple and the str_starts_with(trim($sortingKey), 'tenant') check.
  • T-Rex attempted to run the PHP harness, but the environment has no PHP executable to run it.
  • T-Rex ran the base validation command sequence and encountered blockers: php: command not found, composer: command not found, and ClickHouse connection refused.
  • T-Rex then ran the head validation command sequence and faced the same blockers.
  • Because the changed code path could not be executed against a real/local ClickHouse service, no PASS/FAIL determination could be made for the PR contract.
  • T-Rex checked out the base branch; docker, composer, and PHP were not found and the command exited with status 127.
  • T-Rex checked out the head branch; docker, composer, and PHP were not found and the command exited with status 127.
  • Because ClickHouse was not started and PHP was not available, system.columns and read/write checks were not executed.
  • T-Rex inspected the nonshared-sort-key-01-before.log to confirm the Before state, including the fallback command, revision, unavailable runtime tools, and the base line SETTINGS.
  • T-Rex inspected the nonshared-sort-key-02-after.log to confirm the After state, showing head resolves to (time, id) and the allow_nullable_key gating logic.

Reviews (4): Last reviewed commit: "Lead ClickHouse audit sort key with tena..."

Comment thread src/Audit/Adapter/ClickHouse.php Outdated
Comment thread src/Audit/Adapter/ClickHouse.php Outdated
Comment thread src/Audit/Adapter/ClickHouse.php Outdated
lohanidamodar force-pushed the feat/tenant-time-projection branch 2 times, most recently from 4e24c39 to 27f1dc4, on June 22, 2026 02:22
lohanidamodar changed the title from "Lead ClickHouse audit sort key with tenant + projection for existing tables" to "Lead ClickHouse audit sort key with tenant + LowCardinality columns" on Jun 22, 2026
lohanidamodar force-pushed the feat/tenant-time-projection branch 3 times, most recently from 3a5ee07 to b77d2fb, on June 22, 2026 02:47
Comment thread tests/Audit/Adapter/ClickHouseTest.php Outdated
Comment thread tests/Audit/Adapter/ClickHouseTest.php Outdated
New shared audit tables are created natively with ORDER BY (tenant, time, id)
plus allow_nullable_key, so tenant-scoped time-range reads prune by tenant
first. Single-tenant tables keep the historical (time, id) key.

event, actorType, resourceType and country are now LowCardinality columns for
smaller storage and faster scans (country wrapped as LowCardinality(Nullable)).
The full standard secondary-index set is retained.

Migrating a pre-existing (time, id) table (adding/materializing the projection,
converting columns to LowCardinality, dropping redundant indexes) is an
out-of-band operator step, NOT part of the library.

The tenant-leading sort key is covered by a live integration test that runs
setup() against the ClickHouse service and asserts system.tables.sorting_key
leads with tenant.
lohanidamodar force-pushed the feat/tenant-time-projection branch from b77d2fb to 3f2c1ce on June 22, 2026 03:06
Comment on lines +991 to +994
    $this->assertTrue(
        str_starts_with(trim($sortingKey), 'tenant'),
        "Expected sorting key to lead with 'tenant', got: {$sortingKey}"
    );

P1 Normalize sorting key

ClickHouse reports multi-column system.tables.sorting_key values in tuple form, such as (tenant, time, id). With that format, this assertion fails even when setup() created the correct tenant-leading key, so the new live test can reject a valid schema.
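One way to make the assertion independent of how the key is rendered is to extract the leading column before comparing. The sketch below is an illustration only: `leadingSortColumn` is a hypothetical helper, not part of the library or the test suite, and it assumes the key consists of plain column names rather than expressions. It accepts both the bare `tenant, time, id` form and the parenthesized `(tenant, time, id)` form.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: returns the first column named in a sorting key
 * string, with or without wrapping parentheses.
 */
function leadingSortColumn(string $sortingKey): string
{
    $key = trim($sortingKey);

    // Strip one pair of wrapping parentheses if the key is in tuple form.
    if (str_starts_with($key, '(') && str_ends_with($key, ')')) {
        $key = substr($key, 1, -1);
    }

    // The leading column is everything before the first comma.
    $first = explode(',', $key)[0];

    // Drop surrounding whitespace and identifier backticks.
    return trim($first, " \t\n\r`");
}

var_dump(leadingSortColumn('(tenant, time, id)') === 'tenant');
var_dump(leadingSortColumn('tenant, time, id') === 'tenant');
```

In the test this could be used as `$this->assertSame('tenant', leadingSortColumn($sortingKey));`. Comparing the whole column name also avoids a false pass on a key that merely begins with the same prefix, such as `tenant_id`.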

lohanidamodar merged commit f5bb078 into main on Jun 22, 2026
4 checks passed
lohanidamodar deleted the feat/tenant-time-projection branch on June 22, 2026 03:33